Answer Key: Data Assignment 1

Working with Panel Survey Data

Author: Jeremy Springman
Affiliation: University of Pennsylvania
Published: April 5, 2024

Part 1: Read-in data and prepare for analysis

library(ggplot2)
library(readr)
library(ggdag)
library(tidyverse)
library(gt)
library(modelsummary)

# read in data (the commented-out line reads the same file from GitHub)
dat = read_csv(here::here("workshops/aau_survey/clean_endline_did.csv")) %>%
# dat = read_csv("https://raw.githubusercontent.com/jrspringman/psci3200-globaldev/main/workshops/aau_survey/clean_endline_did.csv") %>%
  # clean home region variable: abbreviate SNNPR, then drop the " Region" suffix
  mutate(q8_baseline = ifelse(q8_baseline == "Southern Nations, Nationalities, and Peoples Region", "SNNPR", q8_baseline),
         q8_baseline = str_remove(q8_baseline, " Region"))

# create color palette for plotting
palette = MetBrewer::met.brewer(name = "Cross")

Requirement 1 (10%)

For all variables after user_language, please rename the column with a descriptive name that better conveys their meaning. Column names should never contain spaces and should be as easy to type as possible. Do this for both their baseline and endline values, making sure to indicate which columns are baseline measures and which are endline measures in the names you assign.
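One way to approach the renaming is sketched below on a small toy data frame. The new names (`gender_base`, `efficacy_1_base`, `efficacy_1_end`) and the assumed meaning of the `q17_1` items are hypothetical; substitute names that match the survey codebook. The sketch also assumes that endline columns carry no suffix while baseline columns end in `_baseline`.

```r
library(dplyr)

# Toy stand-in for the survey data; values and item meanings are made up.
toy <- data.frame(
  response_id    = c("a", "b"),
  user_language  = c("EN", "AM"),
  q3_baseline    = c("Male", "Female"),
  q17_1_baseline = c(3, 4),
  q17_1          = c(4, 4)
)

# rename(new_name = old_name): mark each column as baseline or endline.
renamed <- toy %>%
  rename(
    gender_base     = q3_baseline,
    efficacy_1_base = q17_1_baseline,
    efficacy_1_end  = q17_1
  )

names(renamed)
```

Using a consistent `_base` / `_end` suffix keeps the names short and makes it easy to select all baseline or all endline columns later.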

Part 2: Create Index Measures

Next, you’ll need to create index measures for different types of variables. We will use the two types of index methods described during the in-class workshop.

Requirement 2 (10%)

First, in your own words, explain the concept of an additive index and an averaged z-score, including how they are calculated, when you should use them, and when you cannot use them. What are the benefits of each approach?
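As a compact reference for the two definitions, here is a sketch in notation that is assumed for illustration rather than taken from the workshop: \(x_{ij}\) is respondent \(i\)'s numeric answer to item \(j\), \(J\) is the number of items in the index, and \(\bar{x}_j\) and \(s_j\) are the sample mean and standard deviation of item \(j\).

\[ Additive_i = \sum_{j=1}^{J} x_{ij} \]

\[ z_{ij} = \frac{x_{ij} - \bar{x}_j}{s_j}, \qquad Zscore_i = \frac{1}{J} \sum_{j=1}^{J} z_{ij} \]

A sum is easiest to read when every item is on the same scale (for example, 0/1 items, where the sum is a count). Standardizing first puts items measured on different scales on a common footing before they are averaged.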

Requirement 3 (20%)

Next, you’ll need to:

  1. Create an additive index for the baseline and endline measures of the “Future plans for a career in public sector or civil society” variables. This should correspond to separate counts of the number of future plans that each individual has at baseline and at endline.
  2. Create averaged z-scores for the baseline and endline values of the “Future plans for a career in public sector or civil society” and “Feelings of political efficacy” variables.
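The two index types in the list above can be sketched as follows. The item names (`plan_1` to `plan_3`), the helper `z_score()`, and the toy values are assumptions for illustration; the same pattern would be repeated for the endline items and for the efficacy items.

```r
library(dplyr)

# Toy data: three hypothetical binary "future plans" items (1 = has the plan).
toy <- data.frame(
  plan_1 = c(1, 0, 1, 0),
  plan_2 = c(1, 0, 0, 0),
  plan_3 = c(1, 1, 0, 0)
)
items <- c("plan_1", "plan_2", "plan_3")

# Hypothetical helper: standardize one column to mean 0 and SD 1.
z_score <- function(x) (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)

indexed <- toy %>%
  # Add a standardized copy of each item, suffixed with "_st".
  mutate(across(all_of(items), z_score, .names = "{.col}_st")) %>%
  mutate(
    # Additive index: number of plans each respondent reports.
    plans_add = rowSums(.[items], na.rm = TRUE),
    # Averaged z-score: row-wise mean of the standardized items.
    z_plans   = rowMeans(.[paste0(items, "_st")], na.rm = TRUE)
  )

indexed[, c("plans_add", "z_plans")]
```

Note that `na.rm = TRUE` treats a missing item as zero in the sum and averages over the answered items in the z-score; whether that is appropriate depends on how missing answers should be handled in the real data.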

Note: Since many of these are ordered categorical variables, you will need to convert them to numeric values. Make sure that the numeric assignments follow the direction of the ordered categories, so that higher categories receive higher numbers and lower categories receive lower numbers.
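A minimal sketch of that conversion, assuming a four-point agreement scale: the response labels and the column name `efficacy_1_base` are hypothetical and should be replaced with the labels that actually appear in the data.

```r
library(dplyr)

# Hypothetical response labels, listed from lowest to highest.
agree_levels <- c("Strongly disagree", "Disagree", "Agree", "Strongly agree")

toy <- data.frame(
  efficacy_1_base = c("Agree", "Strongly disagree", "Strongly agree", NA)
)

converted <- toy %>%
  mutate(
    # factor() fixes the order; as.numeric() returns the position in that order.
    efficacy_1_base_num = as.numeric(factor(efficacy_1_base, levels = agree_levels))
  )

converted$efficacy_1_base_num
# 3 1 4 NA

# Confirm the new column is numeric before building an index from it.
lapply(converted["efficacy_1_base_num"], class)
```

Any label that is not listed in `agree_levels` becomes `NA`, so it is worth tabulating the raw column first to catch spelling variants.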

$q17_1_baseline_st
[1] "numeric"

$q17_2_baseline_st
[1] "numeric"

$q17_3_baseline_st
[1] "numeric"

$q17_1_st
[1] "numeric"

$q17_2_st
[1] "numeric"

$q17_3_st
[1] "numeric"

Part 3: Estimating models

Now, let’s estimate some models to assess the relationship between the two index measures. Before we get started, subset your data to include only response_id, q3_baseline (which you should have renamed), and the baseline and endline measures for each z-score. You should end up with 6 variables in your dataframe.
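The subsetting step might look like the sketch below. The column names `z_plans_base`, `z_plans_end`, and `z_efficacy_end` are assumed for illustration (only `z_efficacy_base` and `q3_baseline` appear in the regression output later in this key), and the toy values are made up.

```r
library(dplyr)

# Toy data with one extra column that should be dropped.
toy <- data.frame(
  response_id     = c("a", "b", "c"),
  q3_baseline     = c("Male", "Female", "Female"),
  extra_var       = c(1, 2, 3),
  z_plans_base    = c(0.2, -0.1, 0.4),
  z_plans_end     = c(0.5, 0.0, 0.1),
  z_efficacy_base = c(1.0, -1.0, 0.0),
  z_efficacy_end  = c(0.8, -0.4, 0.2)
)

# Keep the identifier, the gender variable, and the four z-score columns.
analysis <- toy %>%
  select(response_id, q3_baseline, starts_with("z_"))

ncol(analysis)
# 6
```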

Requirement 5 (15%)

Using baseline values only, estimate a model regressing your “Future plans” index on your “Feelings of political efficacy” index. Your model should take the following form:

\[ Future\_plans_i = \alpha + \beta_1 Efficacy_{i1} + \epsilon_i \]

Use the modelsummary package to present the results as a table. In your own words, interpret the meaning of \(\alpha\) and \(\beta_1\). Substantively, how should we interpret the relationship described in the data? What does this tell us about the world? What assumptions would we need in order to interpret the relationship as causal?
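A minimal sketch of the bivariate model, fitted to simulated data rather than the survey, so the printed estimates carry no substantive meaning. The variable name `z_plans_base` is an assumption; the `modelsummary()` call is guarded so the block still runs if the package is not installed.

```r
set.seed(1)
n <- 200

# Simulated stand-ins for the two baseline z-score indices.
toy <- data.frame(z_efficacy_base = rnorm(n))
toy$z_plans_base <- 0.1 * toy$z_efficacy_base + rnorm(n)

# OLS: the intercept is alpha, the slope on z_efficacy_base is beta_1.
m1 <- lm(z_plans_base ~ z_efficacy_base, data = toy)
coef(m1)

# Present the model as a table when modelsummary is available.
if (requireNamespace("modelsummary", quietly = TRUE)) {
  modelsummary::modelsummary(list("Bivariate 1" = m1),
                             stars = TRUE, output = "markdown")
}
```

Because both indices are z-scores, \(\beta_1\) reads as the expected change in the plans index, in standard-deviation units, for a one-standard-deviation increase in efficacy, and \(\alpha\) is the expected plans index when efficacy is at its mean of zero.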

Requirement 6 (15%)

Convert the baseline and endline values of your “Feelings of political efficacy” index to a binary indicator that takes a value of 1 if the individual has a value greater than or equal to the sample mean and a value of 0 if the individual has a value below the sample mean.

Using baseline values only, estimate the same model, but interact your binary “Feelings of political efficacy” indicator with the gender indicator. Your model should take the following form:

\[ Future\_plans_i = \alpha + \beta_1 Efficacy_{i} + \beta_2 Gender_{i} + \beta_3 (Efficacy_{i}*Gender_{i}) + \epsilon_i \]

Use the modelsummary package to present the results as a table. In your own words, interpret the meaning of \(\alpha\), \(\beta_1\), \(\beta_2\), and \(\beta_3\). Substantively, how should we interpret the interactive relationship described in the data?
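The binary indicator and the interaction model can be sketched as follows, again on simulated data. The outcome name `z_plans_base` is assumed; `z_efficacy_base_bin` and `q3_baseline` follow the names shown in the regression output in this key.

```r
set.seed(2)
n <- 200

# Simulated stand-ins; none of these values come from the survey.
toy <- data.frame(
  z_efficacy_base = rnorm(n),
  q3_baseline     = sample(c("Female", "Male"), n, replace = TRUE)
)
toy$z_plans_base <- rnorm(n)

# 1 if at or above the sample mean of the efficacy index, 0 if below.
toy$z_efficacy_base_bin <- as.integer(
  toy$z_efficacy_base >= mean(toy$z_efficacy_base, na.rm = TRUE)
)

# a * b expands to a + b + a:b, matching the equation above.
m3 <- lm(z_plans_base ~ z_efficacy_base_bin * q3_baseline, data = toy)
names(coef(m3))
```

R treats the character variable as a factor with levels in alphabetical order, so "Female" is the reference category here. Under that coding, the intercept is the mean outcome for low-efficacy women, the `z_efficacy_base_bin` coefficient is the high-versus-low efficacy gap among women, the `q3_baselineMale` coefficient is the male-versus-female gap among low-efficacy respondents, and the interaction is how much the efficacy gap differs for men.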

| | Bivariate 1 | Bivariate 2 | Interaction |
|---|---|---|---|
| (Intercept) | 0.000 (0.017) | −0.052* (0.025) | −0.106* (0.045) |
| z_efficacy_base | 0.083*** (0.024) | | |
| z_efficacy_base_bin | | 0.095** (0.034) | 0.108+ (0.066) |
| q3_baselineMale | | | 0.078 (0.055) |
| z_efficacy_base_bin × q3_baselineMale | | | −0.026 (0.077) |
| Num.Obs. | 817 | 817 | 817 |
| R2 Adj. | 0.013 | 0.008 | 0.009 |

Convert the data from ‘wide’ to ‘long’ format, so that each respondent (response_id) has two rows of data; one row is baseline and one row is endline. Create a Post indicator that takes a value of 1 in rows that contain endline measures and a value of 0 in rows that contain baseline measures.
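One way to reshape the data is `tidyr::pivot_longer()` with the special `.value` name, sketched here on toy data. The `_base` / `_end` suffixes and the column names are assumptions; the regular expression must be adapted to whatever suffixes the renamed columns actually use.

```r
library(dplyr)
library(tidyr)

# Toy wide data: one row per respondent, one column per index and wave.
wide <- data.frame(
  response_id     = c("a", "b"),
  z_plans_base    = c(0.2, -0.1),
  z_plans_end     = c(0.5,  0.0),
  z_efficacy_base = c(1.0, -1.0),
  z_efficacy_end  = c(0.8, -0.4)
)

long <- wide %>%
  pivot_longer(
    cols          = -response_id,
    # Split each name into the index name (kept as a column) and the wave.
    names_to      = c(".value", "wave"),
    names_pattern = "(.*)_(base|end)$"
  ) %>%
  # Post = 1 for endline rows, 0 for baseline rows.
  mutate(post = as.integer(wave == "end"))

long
```

Each respondent now has two rows, and `z_plans` and `z_efficacy` each hold the baseline value where `post` is 0 and the endline value where `post` is 1.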

Using this new ‘long’ format, estimate the same model, but interact your binary “Feelings of political efficacy” indicator with the Post indicator. Your model should take the following form:

\[ Future\_plans_{it} = \alpha + \beta_1 Efficacy_{it} + \beta_2 Post_{it} + \beta_3 (Efficacy_{it}*Post_{it}) + \epsilon_{it} \]

In your own words, interpret the meaning of \(\alpha\), \(\beta_1\), \(\beta_2\), and \(\beta_3\).
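To make the interpretation concrete, the sketch below fits the interaction model to simulated long-format data and checks that each coefficient equals a difference in group means. The data and the helper `cell()` are hypothetical. The simulated rows are independent, so the sketch ignores the fact that in the real panel each respondent appears twice.

```r
set.seed(3)
n <- 400

# Simulated long-format data: half the rows are baseline, half endline.
long <- data.frame(
  z_efficacy_bin = rbinom(n, 1, 0.5),
  post           = rep(c(0, 1), each = n / 2)
)
long$z_plans <- rnorm(n)

m_post <- lm(z_plans ~ z_efficacy_bin * post, data = long)
b <- unname(coef(m_post))  # order: alpha, beta_1, beta_2, beta_3

# Hypothetical helper: mean outcome for one efficacy-by-wave cell.
cell <- function(e, p) {
  mean(long$z_plans[long$z_efficacy_bin == e & long$post == p])
}

alpha_hat <- cell(0, 0)                  # low efficacy, baseline
beta1_hat <- cell(1, 0) - cell(0, 0)     # efficacy gap at baseline
beta2_hat <- cell(0, 1) - cell(0, 0)     # change over time, low efficacy
beta3_hat <- (cell(1, 1) - cell(0, 1)) - (cell(1, 0) - cell(0, 0))

all.equal(b, c(alpha_hat, beta1_hat, beta2_hat, beta3_hat))
```

With two binary regressors and their interaction the model is saturated, so OLS reproduces the four cell means exactly; \(\beta_3\) is the change in the efficacy gap between baseline and endline.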

| | Fixed Effects 1 | Fixed Effects 2 |
|---|---|---|
| (Intercept) | −0.455+ (0.259) | −0.469+ (0.258) |
| z_efficacy | 0.016 (0.023) | |
| z_efficacy_bin | | −0.017 (0.030) |
| Num.Obs. | 1633 | 1633 |
| R2 Adj. | 0.430 | 0.430 |